76 research outputs found

    Spectral embedding finds meaningful (relevant) structure in image and microarray data

    BACKGROUND: Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter(s) fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied on the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented. RESULTS: We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian, that minimizes the requirements for such parameter optimization for two biological data types. We demonstrate that it is successful in determining implicit ordering of brain slice image data and in classifying separate species in microarray data, as compared to two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method). This spectral implementation is shown to provide more meaningful information, by preserving important relationships, than the methods of DR presented for comparison. Tuning parameter fitting is simple and is a general, rather than data type or experiment specific approach, for the two datasets analyzed here. 
Tuning parameter optimization in the DR step is minimized for each subsequent classification method, making valid cross-experiment comparisons possible. CONCLUSION: Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in lower dimensional space and requiring minimal parameter fitting, providing a useful algorithm for purposes of visualization and classification across diverse datasets, a common challenge in systems biology.
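To make the idea of a graph-Laplacian embedding recovering an implicit ordering concrete, here is a minimal diffusion-map-style sketch in plain Python. It is not the paper's implementation: the function name `diffusion_coordinate`, the Gaussian kernel width `eps`, the power-iteration solver, and the toy data (points on a half circle, presented in shuffled order) are all assumptions chosen for illustration.

```python
import math
import random


def diffusion_coordinate(points, eps, iters=2000):
    """Return the first non-trivial embedding coordinate for each point."""
    n = len(points)
    # Gaussian affinities between all pairs of points (a weighted graph).
    w = [[math.exp(-math.dist(p, q) ** 2 / eps) for q in points] for p in points]
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; same spectrum as D^{-1} W.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The trivial top eigenvector of S is proportional to sqrt(deg).
    norm0 = math.sqrt(sum(deg))
    v0 = [math.sqrt(d) / norm0 for d in deg]
    # Power iteration with v0 projected out converges to the second eigenvector.
    x = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        y = [sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
        proj = sum(a * b for a, b in zip(y, v0))
        y = [a - proj * b for a, b in zip(y, v0)]
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]
    # Rescale to an eigenvector of the random-walk matrix D^{-1} W.
    return [x[i] / math.sqrt(deg[i]) for i in range(n)]


# Toy data (hypothetical): 15 points along a half circle, given in shuffled order.
n = 15
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
points = [(math.cos(math.pi * k / (n - 1)), math.sin(math.pi * k / (n - 1)))
          for k in shuffled]

coord = diffusion_coordinate(points, eps=0.1)
# Sorting by the embedding coordinate should recover the position along the arc
# (possibly reversed, since the sign of an eigenvector is arbitrary).
recovered = [shuffled[i] for i in sorted(range(n), key=lambda i: coord[i])]
```

The single coordinate plays the role of a one-dimensional embedding; real image or microarray data would need a suitable distance measure and more than one coordinate.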

    An online database for brain disease research

    BACKGROUND: The Stanley Medical Research Institute online genomics database (SMRIDB) is a comprehensive web-based system for understanding the genetic effects of human brain disease (i.e., bipolar disorder, schizophrenia, and depression). This database contains fully annotated clinical metadata and gene expression patterns generated within 12 controlled studies across 6 different microarray platforms. DESCRIPTION: A thorough collection of gene expression summaries is provided, inclusive of patient demographics, disease subclasses, regulated biological pathways, and functional classifications. CONCLUSION: The combination of database content, structure, and query speed offers researchers an efficient tool for data mining of brain disease, complete with information such as cross-platform comparisons, biomarker elucidation for target discovery, and lifestyle/demographic associations to brain diseases.

    Use of type I interferon-inducible mRNAs as pharmacodynamic markers and potential diagnostic markers in trials with sifalimumab, an anti-IFNα antibody, in systemic lupus erythematosus

    Type I interferons are implicated in the pathogenesis of systemic lupus erythematosus (SLE). Type I interferon-inducible mRNAs are widely and concordantly overexpressed in the periphery and involved tissues of a subset of SLE patients, and provide utility as pharmacodynamic biomarkers to aid dose selection, as well as potential indicators of patients who might respond favorably to anti-IFNα therapy in SLE. We implemented a three-tiered approach to identify a panel of type I interferon-inducible mRNAs to be used as potential pharmacodynamic biomarkers to aid dose selection in clinical trials of sifalimumab, an anti-IFNα monoclonal antibody under development for the treatment of SLE. In a single-dose escalation phase 1 trial, we observed a sifalimumab-specific and dose-dependent inhibition of the overexpression of type I interferon-inducible mRNAs in the blood of treated subjects. Inhibition of expression of type I interferon-inducible mRNAs and proteins was also observed in skin lesions of SLE subjects from the same trial. Inhibiting IFNα resulted in a profound downstream effect in these SLE subjects that included suppression of mRNAs of B-cell activating factor belonging to the TNF family and the signaling pathways of TNFα, IL-10, IL-1β, and granulocyte-macrophage colony-stimulating factor in both the periphery and skin lesions. A scoring method based on the expression of type I interferon-inducible mRNAs partitioned SLE patients into two distinct subpopulations, which suggests the possibility of using these type I interferon-inducible genes as predictive biomarkers to identify SLE patients who might respond more favorably to anti-type I interferon therapy

    Population Density, Group Size or Something in Between: Effects of a Variable Social Structure on Parasite Transmission

    Critical to our understanding of disease dynamics and effective disease control strategies is the relationship between host density and parasite transmission rates. To accurately describe this relationship, it is important to measure host density at the scale at which transmission is occurring. In social species, for example, transmission may be more related to group size than to the density of the population as a whole. But when aggregation patterns vary in size across space and time, our ability to quantify the density-transmission relationship may depend on measuring density somewhere in between population density and group size. To address this issue, we examined elk (Cervus elaphus) populations in western Wyoming that have been exposed to the bacterium (Brucella abortus) that causes brucellosis. We measured elk density at multiple scales ranging from population density to group size, and evaluated the functional relationship between density and brucellosis seroprevalence. Our study found that low elk density did not explain why Brucella had not effectively invaded several populations. However, in populations with multiple years of seropositive test results, the rates of increase in seroprevalence saturate with increasing elk density regardless of the density measure used. The different densities were poorly correlated with one another, and therefore high elk densities at broad scales did not guarantee high elk densities at fine scales, but both may be important to the transmission of Brucella. This suggests that reducing or altering elk density may not effectively reduce transmission.

    Modeling the effects of a Staphylococcal Enterotoxin B (SEB) on the apoptosis pathway

    BACKGROUND: The lack of detailed understanding of the mechanism of action of many biowarfare agents poses an immediate challenge to biodefense efforts. Many potential bioweapons have been shown to affect the cellular pathways controlling apoptosis [1-4]. For example, pathogen-produced exotoxins such as Staphylococcal Enterotoxin B (SEB) and Anthrax Lethal Factor (LF) have been shown to disrupt the Fas-mediated apoptotic pathway [2,4]. To evaluate how these agents affect these pathways it is first necessary to understand the dynamics of a normally functioning apoptosis network. This can then serve as a baseline against which a pathogen perturbed system can be compared. Such comparisons can expose both the proteins most susceptible to alteration by the agent as well as the most critical reaction rates to better instill control on a biological network. RESULTS: We explore this through the modeling and simulation of the Fas-mediated apoptotic pathway under normal and SEB influenced conditions. We stimulated human Jurkat cells with an anti-Fas antibody in the presence and absence of SEB and determined the relative levels of seven proteins involved in the core pathway at five time points following exposure. These levels were used to impute relative rate constants and build a quantitative model consisting of a series of ordinary differential equations (ODEs) that simulate the network under both normal and pathogen-influenced conditions. Experimental results show that cells exposed to SEB exhibit an increase in the rate of executioner caspase expression (and subsequently apoptosis) of 1 hour 43 minutes (± 14 minutes), as compared to cells undergoing normal cell death. CONCLUSION: Our model accurately reflects these results and reveals intervention points that can be altered to restore SEB-influenced system dynamics back to levels within the range of normal conditions

    Improving accuracy of GPT-3/4 results on biomedical data using a retrieval-augmented language model

    Large language models (LLMs) have made significant advancements in natural language processing (NLP). Broad corpora capture diverse patterns but can introduce irrelevance, while focused corpora enhance reliability by reducing misleading information. Training LLMs on focused corpora poses computational challenges. An alternative approach is to use a retrieval-augmentation (RetA) method tested in a specific domain. To evaluate LLM performance, OpenAI's GPT-3, GPT-4, Bing's Prometheus, and a custom RetA model were compared using 19 questions on diffuse large B-cell lymphoma (DLBCL) disease. Eight independent reviewers assessed responses based on accuracy, relevance, and readability (rated 1-3). The RetA model performed best in accuracy (12/19 3-point scores, total=47) and relevance (13/19, 50), followed by GPT-4 (8/19, 43; 11/19, 49). GPT-4 received the highest readability scores (17/19, 55), followed by GPT-3 (15/19, 53) and the RetA model (11/19, 47). Prometheus underperformed in accuracy (34), relevance (32), and readability (38). Both GPT-3.5 and GPT-4 had more hallucinations in all 19 responses compared to the RetA model and Prometheus. Hallucinations were mostly associated with non-existent references or fabricated efficacy data. These findings suggest that RetA models, supplemented with domain-specific corpora, may outperform general-purpose LLMs in accuracy and relevance within specific domains. However, this evaluation was limited to specific questions and metrics and may not capture challenges in semantic search and other NLP tasks. Further research will explore different LLM architectures, RetA methodologies, and evaluation methods to assess strengths and limitations more comprehensively
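The score notation in this abstract, for example "17/19, 55" for GPT-4 readability, pairs the number of questions that received the top rating of 3 with the summed rating over all 19 questions. The sketch below shows that arithmetic. It assumes one aggregate 1-3 rating per question (the abstract does not say how the eight reviewers' ratings were combined), and the function name `summarize` is hypothetical. Under that assumption, the GPT-4 readability figures force the ratings to be seventeen 3s and two 2s, so no per-question data is invented here.

```python
def summarize(ratings):
    """Return (number of top ratings, total score) for a list of 1-3 ratings."""
    top = sum(1 for r in ratings if r == 3)
    return top, sum(ratings)


# Only rating set consistent with "17/19, 55": 17 * 3 = 51, and the remaining
# two non-top ratings must add up to 4, so both are 2.
gpt4_readability = [3] * 17 + [2] * 2
top, total = summarize(gpt4_readability)
max_total = 3 * len(gpt4_readability)  # 57 is the best possible total
```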

    PheMaDB: A solution for storage, retrieval, and analysis of high throughput phenotype data

    BACKGROUND: OmniLog™ phenotype microarrays (PMs) can measure and compare the growth responses of biological samples upon exposure to hundreds of growth conditions, such as different metabolites and antibiotics, over a time course of hours to days. To manage the large amount of data produced by the OmniLog™ instrument, PheMaDB (Phenotype Microarray DataBase), a web-based relational database, was designed. PheMaDB enables efficient storage, retrieval and rapid analysis of the OmniLog™ PM data. DESCRIPTION: PheMaDB allows the user to quickly identify records of interest for data analysis by filtering with a hierarchical ordering of Project, Strain, Phenotype, Replicate, and Temperature. PheMaDB then provides various statistical analysis options to identify specific growth pattern characteristics of the experimental strains, such as outlier analysis, negative controls analysis (signal/background calibration), bar plots, Pearson's correlation matrix, growth curve profile search, k-means clustering, and a heat map plot. This web-based database management system allows for both easy data sharing among multiple users and robust tools to phenotype organisms of interest. CONCLUSIONS: PheMaDB is an open source system standardized for OmniLog™ PM data. PheMaDB could facilitate the banking and sharing of phenotype data. The source code is available for download at http://phemadb.sourceforge.net.
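One of the analyses listed above, a Pearson's correlation matrix over growth curves, can be sketched in a few lines of plain Python. This is an illustration only, not PheMaDB's code: the well names and signal values are made up, and `pearson` and `correlation_matrix` are hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(curves):
    """Map each pair of curve names to the Pearson correlation of their signals."""
    names = list(curves)
    return {a: {b: pearson(curves[a], curves[b]) for b in names} for a in names}


# Made-up growth signals at five time points for three hypothetical wells.
curves = {
    "A01": [0, 10, 40, 90, 100],   # growth
    "A02": [0, 20, 80, 180, 200],  # same shape as A01, twice the signal
    "A03": [100, 90, 40, 10, 0],   # declining signal
}
matrix = correlation_matrix(curves)
```

Because Pearson correlation ignores scale, `matrix["A01"]["A02"]` is 1 up to rounding even though the signals differ in magnitude, while `matrix["A01"]["A03"]` is negative.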